LYAM++ results for OAEI 2015
Authors: N. Tigrine, Z. Bellahsene, K. Todorov
Abstract
The paper presents a novel technique for aligning cross-lingual ontologies that does not rely on machine translation, but uses the large multilingual semantic network BabelNet as a source of background knowledge. In addition, our approach applies a novel orchestration of the components of the matching workflow. We demonstrate that our method considerably outperforms the best state-of-the-art techniques.
1 Presentation of the system
In spite of the considerable advances made recently in the field of ontology matching, many questions remain open [1]. The current work addresses the challenge of using background knowledge, with a focus on aligning cross-lingual ontologies, i.e., ontologies defined in different natural languages [2]. Indeed, considering multilingual and cross-lingual information is becoming more and more important, particularly in view of the growing number of content-creating non-English users and the clear demand for cross-language interoperability. In the context of the web of data, it is important to propose procedures for linking vocabularies across natural languages, in order to foster the creation of a veritable global information network.
The use of different natural languages in labeling concepts and relations is becoming an important source of ontology heterogeneity. The methods proposed to deal with it most commonly rely on automatic translation of labels to a single target language [3] or apply machine learning techniques [2]. However, machine translation suffers from low precision, and machine learning methods require large training corpora that are rarely available in an ontology matching scenario. An inherent problem of translation is that there is often no exact one-to-one correspondence between the terms of different natural languages.
1.1 State, purpose, general statement
We present LYAM++ (Yet Another Matcher Light), a fully automatic cross-lingual ontology matching system that does not rely on machine translation. Instead, we make use of the openly available general-purpose multilingual semantic network BabelNet (http://babelnet.org/) in order to recreate the missing semantic context in the matching process. Another original feature of our approach is the choice of orchestration of the matching workflow. Our experiments on the MultiFarm benchmark data show that (1) our method outperforms the best approaches in the current state-of-the-art and (2) the novel workflow orchestration provides better results compared to the classical one.
Fig. 1: The processing pipeline of LYAM++.
1.2 Specific techniques used
The workflow of LYAM++ is given in Fig. 1. We take as input a source ontology S, given in a natural language lS, and a target ontology T, given in a language lT. The overall process chains a multilingual terminological matcher, a mapping selection module and, finally, a structural matcher. One of the original contributions of this work is the choice of orchestration of these components. Indeed, the places of the mapping selection module and the structural matcher are reversed in the existing OM tools [4]. However, we wanted to ensure that only good-quality mappings are fed to the structural matcher, so we decided to filter the discovered correspondences right after producing the initial alignment. This decision is supported experimentally in the following section.
The multilingual terminological matching module, the second contribution described in this paper, acts on the one hand as a preprocessing component and, on the other hand, as a lightweight terminological matcher between cross-lingual labels.
We start by splitting the elements of each ontology into three groups: labels of classes, labels of object properties and labels of data properties (shown in blue, black and red in the figure), since these groups of elements are to be aligned separately. A standard preprocessing procedure is applied to these sets of labels, comprising character normalization, stop-word filtering, tokenization and lemmatization.
The tokens of the elements of T are then aligned to BabelNet. First, every token of a given label s in S is enriched with related terms and synonyms from BabelNet, and all of these terms are represented in the language lT, which makes them comparable to the tokens of the labels in T. A simple similarity evaluation using the Jaccard coefficient then selects, from the set of related terms corresponding to each token of s, the term that has the highest score with respect to the tokens of the labels of T. This makes it possible to reconstruct the label s in the language lT. Finally, the labels in each group of S and T, seen as sets of tokens, are compared using the Soft TF-IDF similarity measure [5], which produces an intermediate terminological alignment.
The remaining components are standard OM modules [4], although ordered in a new manner. The mapping selection module transforms the initial one-to-many mapping into a 1:1 alignment by iteratively retaining the pairs of concepts with the maximal similarity value. Finally, the structural matcher filters the trustworthy pairs of aligned concepts by looking at the similarity values produced for their parents and their children in the ontology hierarchies.
MultiFarm benchmark data: http://web.informatik.uni-mannheim.de/multifarm/
1.3 Link to the system and parameters file
The system is not yet available online, because it depends heavily on BabelNet, which is a protected resource.
We are working on implementing a shareable version of LYAM++ that makes use of different open-access background knowledge sources.
1.4 Link to the set of provided alignments (in align format)
The alignments produced by LYAM++ for this year's MultiFarm track can be found at the following link: http://www.lirmm.fr/benellefi/Lyam++.rar
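As an illustration of the mapping selection step described in Section 1.2, the following is a minimal sketch and not the LYAM++ implementation. It assumes the source labels have already been expressed as sets of tokens in the target language, and it uses a plain Jaccard coefficient over token sets as a stand-in for the Soft TF-IDF scores. The function names, the toy labels and the threshold parameter are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_one_to_one(
    scores: Dict[Tuple[str, str], float], threshold: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Greedy 1:1 selection: iteratively retain the best remaining pair."""
    selected: List[Tuple[str, str, float]] = []
    used_src: Set[str] = set()
    used_tgt: Set[str] = set()
    # Highest score first; ties are broken by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, tgt), score in ranked:
        if score <= threshold:
            break  # every remaining pair scores at or below the threshold
        if src in used_src or tgt in used_tgt:
            continue  # one side is already part of a retained pair
        selected.append((src, tgt, score))
        used_src.add(src)
        used_tgt.add(tgt)
    return selected


# Toy data (hypothetical): source labels as target-language token sets.
source = {"Auteur": {"author", "writer"}, "Article": {"paper", "article"}}
target = {"Author": {"author"}, "Paper": {"paper"}, "Writer": {"writer"}}

# One-to-many scores between every source and target label.
scores = {
    (s, t): jaccard(s_tokens, t_tokens)
    for s, s_tokens in source.items()
    for t, t_tokens in target.items()
}
alignment = select_one_to_one(scores)
```

In this toy example "Auteur" scores equally against "Author" and "Writer". The greedy pass keeps only one of them, which is the behaviour expected of a module that reduces a one-to-many mapping to a 1:1 alignment before the structural matcher runs.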